{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Datashader](http://datashader.org) renders data into regularly sampled arrays, a process known as [rasterization](https://en.wikipedia.org/wiki/Rasterisation), and then optionally converts that array into a viewable image.  In some cases, your data is *already* rasterized, such as data from imaging experiments, simulations on a regular grid, or other regularly sampled processes. Even so, the rasters you have already are not always the ones you need for a given purpose, having the wrong shape, range, or size to be suitable for overlaying with or comparing against other data, maps, and so on.\n",
    "\n",
    "Datashader provides fast methods for [\"regridding\"](https://climatedataguide.ucar.edu/climate-data-tools-and-analysis/regridding-overview)/[\"re-sampling\"](http://gisgeography.com/raster-resampling/)/\"re-rasterizing\" your gridded datasets, generating new rasters on demand that can be used together with those it generates for any other data types to implement complex cross-datatype analyses or visualizations.\n",
    "\n",
    "To get started, let's declare a small raster using Numpy and wrap it up as an [xarray](http://xarray.pydata.org) DataArray: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np, datashader as ds, xarray as xr\n",
    "from datashader import transfer_functions as tf, reductions as rd\n",
    "\n",
    "def f(x,y):\n",
    "    return np.cos((x**2+y**2)**2)\n",
    "\n",
    "def sample(fn, n=50, range_=(0.0,2.4)):\n",
    "    xs = ys = np.linspace(range_[0], range_[1], n)\n",
    "    x,y = np.meshgrid(xs, ys)\n",
    "    z   = fn(x,y)\n",
    "    return xr.DataArray(z, coords=[('y',ys), ('x',xs)])\n",
    "\n",
    "da = sample(f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we are declaring that the first dimension of the ``DataArray`` (the rows) is called ``y`` and corresponds to the indicated continuous coordinate values in the list ``ys``, and similarly for the second dimension (the columns) called ``x``.  The coords argument is optional, but without it the default integer indexing from Numpy would be used, which would not match how this data was generated (sampling over each of the ``ys``). Datashader only supports rectilinear, 2D or 3D ``DataArray``s, and so will not accept the additional dimensions or non-separable coordinate arrays that xarray allows.\n",
    "\n",
    "DataArrays of this type happen to be the format used for datashader's own rasterized output, so you can now immediately turn this array into an image using ``tf.shade()`` just as you would for points or lines rasterized by datashader:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tf.shade(da)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Interpolation (upsampling)\n",
    "\n",
    "So, what if we want a larger version?  We can do that by upsampling with either nearest-neighbor or bilinear interpolation (the default):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tf.Images(tf.shade(ds.Canvas().raster(da, interpolate='linear'),  name=\"linear interpolation (default)\"),\n",
    "          tf.shade(ds.Canvas().raster(da, interpolate='nearest'), name=\"nearest-neighbor interpolation\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, linear interpolation works well for smoothly varying values, like those in the lower left of this function, but doesn't help much for regions close to or beyond the sampling limits of the original raster.\n",
    "\n",
    "We can choose whatever output size we like and sample the grid over any range we like, though of course there won't be any data in regions outside of the original raster grid: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tf.shade(ds.Canvas(plot_height=200, plot_width=600, x_range=(-2,5), y_range=(-0.1,0.4)).raster(da))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Aggregation (downsampling)\n",
    "\n",
    "The previous examples all use upsampling, from a smaller to a larger number of cells per unit distance in X or Y.  Downsampling works just as for points and lines in Datashader, supporting various aggregation functions.  These aggregation functions determine the result when more than one raster grid cell falls into a given pixel's region of the plane. To illustrate downsampling, let's first render a 500x500 version of the above 50x50 array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "da2 = sample(f, n=500)\n",
    "tf.shade(da2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see that the version we upsampled to this size from the 50x50 samples is similar to this, but this one is much smoother and more accurately represents the underlying mathematical function, because it has sufficient resolution to avoid aliasing in the high-frequency portions of this function (towards the upper right).  \n",
    "\n",
    "Now that we have this larger array, we can downsample it using a variety of aggregation functions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cvs = ds.Canvas(plot_width=150, plot_height=150)\n",
    "tf.Images(tf.shade(cvs.raster(da2),                name=\"mean downsampling (default)\"),\n",
    "          tf.shade(cvs.raster(da2, agg=rd.min()),  name=\"min downsampling\"),\n",
    "          tf.shade(cvs.raster(da2, agg=rd.max()),  name=\"max downsampling\"),\n",
    "          tf.shade(cvs.raster(da2, agg=rd.mode()), name=\"mode downsampling\"),\n",
    "          tf.shade(cvs.raster(da2, agg=rd.std()),  name=\"std downsampling\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here the default downsampling function ``mean`` renders a faithful size-reduced version of the original, with all the raster grid points that overlap a given pixel being averaged to create the final pixel value.  The ``min`` and ``max`` aggregation functions take the minimum or maxiumum, respectively, of the values overlapping each pixel, and you can see here that the ``min`` version has larger light-blue regions towards the upper right (with each pixel reflecting the minimum of all grid cells it overlaps), while the ``max`` version has larger dark-blue regions towards the upper right. The ``mode`` version computes the most common value that overlaps this pixel (not very useful for floating-point data as here, but important for categorical data where ``mean`` would not be valid; in that case you can also use `first` or `last` to take the first or last value found for a given pixel). The ``std`` version reports the standard deviation of the grid cells in each pixel, which is low towards the lower left where the function is smooth, and increases towards the upper right, where the function value varies significantly per pixel (i.e., has many samples in the original grid with different values).\n",
    "\n",
    "The differences between min and max are clearer if we look at a regime where the function varies so much that it can only barely be faithfully be represented in a grid of this size:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "da3 = sample(f, n=500, range_=(0,3))\n",
    "tf.shade(da3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the upper right of this plot, you can see that the function is varying with such high frequency that any downsampled version will fail to represent it properly.  The aggregation functions in Datashader can help you see when that is happening:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cvs = ds.Canvas(plot_width=150, plot_height=150)\n",
    "tf.Images(tf.shade(cvs.raster(da3),               name=\"mean downsampling (default)\"),\n",
    "          tf.shade(cvs.raster(da3, agg=rd.min()), name=\"min downsampling\"),\n",
    "          tf.shade(cvs.raster(da3, agg=rd.max()), name=\"max downsampling\"),\n",
    "          tf.shade(cvs.raster(da3, agg=rd.mode()),name=\"mode downsampling\"),\n",
    "          tf.shade(cvs.raster(da3, agg=rd.std()), name=\"std downsampling\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here you can see that the ``mean`` downsampling looks like a good approximation to the original array, locally averaging the original function values in each portion of the array.  However, if you were to zoom in and adjust for contrast, you would be able to see some of the inevitable aliasing artifacts that occur for any such representation in a too-small array.  These aliasing effects are clearly visible in the ``min`` and ``max`` aggregation, because they keep the local minimum or maxiumum rather than averaging out the artifacts.  Comparing ``mean`` and ``min`` or ``max`` (or subtracting ``min`` from ``max``) can help find regions of the array that are being poorly represented in the current view.\n",
    "\n",
    "## Collections of raster data\n",
    "\n",
    "The above examples all use a single 2D DataArray.  Datashader can also use a 3D DataArray as long as which \"layer\" along this third dimension is wanted is specified:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def g(x, y, k=1):\n",
    "    return np.cos(k*((x**2+y**2)**2))\n",
    "\n",
    "ks = [1.1,3.3,33]\n",
    "xs = ys = np.linspace(0.0, 2.4, 200)\n",
    "y,k,x = np.meshgrid(ys, ks, xs, indexing='xy')\n",
    "\n",
    "da4 = xr.DataArray(g(x,y,k), coords=[('k',ks), ('y',ys), ('x',xs)])\n",
    "\n",
    "tf.Images(tf.shade(ds.Canvas().raster(da4, layer=1.1), name=\"k=1.1\"),\n",
    "          tf.shade(ds.Canvas().raster(da4, layer=3.3), name=\"k=3.3\"),\n",
    "          tf.shade(ds.Canvas().raster(da4, layer=33),  name=\"k=33\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here the factor \"k\" results in the function being evaluated at increasingly higher frequencies, eventually leading to complex [Moir&eacute; patterns](https://en.wikipedia.org/wiki/Moir%C3%A9_pattern) due to undersampling for k=33. 3D xarrays are useful for reading multi-layer image files, such as those from [xarray's rasterio-based support for reading from multilayer TIFFs](http://xarray.pydata.org/en/stable/io.html#rasterio).\n",
    "\n",
    "Similarly, Datashader can accept an xarray Dataset (a dictionary-like collection of aligned DataArrays), as long as the aggregation method specifies a suitable DataArray within the Dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def h(x,y): return np.sin((x**2+y**2)**2)\n",
    "def m(x,y): return np.exp(x+y)\n",
    "\n",
    "dd = xr.Dataset({'cos': sample(f, n=150), \n",
    "                 'sin': sample(h, n=150), \n",
    "                 'exp': sample(m, n=150)})\n",
    "\n",
    "tf.Images(tf.shade(ds.Canvas().raster(dd, agg=rd.mean('cos')), name='cos ((x^2+y^2)^2)'),\n",
    "          tf.shade(ds.Canvas().raster(dd, agg=rd.mean('sin')), name='sin ((x^2+y^2)^2)'),\n",
    "          tf.shade(ds.Canvas().raster(dd, agg=rd.mean('exp')), name='exp (x+y)'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Scaling up\n",
    "\n",
    "The above rasters are all relatively tiny, for illustration purposes, but Datashader's raster support is accelerated using multi-core machine-code generation from Numba, and can re-sample much larger rasters easily.  For instance, rendering the above function into a 10000x10000 raster takes a few seconds on a four-core machine using Numpy, but it can then be re-sampled using Datashader in under 0.1 sec:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%time da5 = sample(m, n=10000, range_=(0,3))\n",
    "%time tf.shade(cvs.raster(da5))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Interactivity\n",
    "\n",
    "In practice, it is usually a good idea to view your data interactively at many different zoom levels to see all the data available and to detect any sampling or aliasing issues, which you can do with Datashader as described in [3_Interactivity](3_Interactivity.ipynb).  For such cases, you can define both upsampling and downsampling methods at the same time; whichever one is needed for a given zoom level and range of the data will be applied.\n",
    "\n",
    "You can see Datashader rasters at work in the [Landsat](../topics/landsat.ipynb) example notebook, which also has examples of reading raster data from TIFF files using xarray and rasterio:\n",
    "\n",
    "<img src=\"../assets/images/landsat.png\" width=602 height=575>"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
